perm filename 154EQ.TEX[1,RWF] blob sn#752250 filedate 1984-04-22 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00002 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	\magnification\magstephalf
C00018 ENDMK
C⊗;
\magnification\magstephalf
\input macro.tex
\def\today{\ifcase\month\or
  January\or February\or March\or April\or May\or June\or
  July\or August\or September\or October\or November\or December\fi
  \space\number\day, \number\year}
\baselineskip 14pt
\def\fnc#1{\mathop{\rm #1}\nolimits}
\rm
\line{\sevenrm langeq.tex[v1,phy] \today\hfill}

\def\zap{\mathrel{\vbox{\lineskip1pt\baselineskip1pt
 \halign{\ctr{$##$}\cr
 \scriptscriptstyle L\cr \sim\cr}}}}
\def\zzap{\mathrel{\vbox{\lineskip1pt\baselineskip1pt
 \halign{\ctr{$##$}\cr
 \scriptscriptstyle L\cr \approx\cr}}}}
\def\zapp{\mathrel{\vbox{\lineskip1pt\baselineskip1pt
 \halign{\ctr{$##$}\cr
 \scriptscriptstyle L↓2\cr \sim\cr}}}}
\def\zappp{\mathrel{\vbox{\lineskip1pt\baselineskip1pt
 \halign{\ctr{$##$}\cr
 \scriptscriptstyle L↓3\cr \sim\cr}}}}
\def\footnote#1#2{#1botinsert{\hrule width5pc \vskip3pt\baselineskip9pt
 \hbox par size{\eightpoint#1#2}}}


\bigskip
\line{CS 154\hfill Spring 1983}
\line{Prof. R.W. Floyd\hfill}

\bigskip\noindent
If $L$ is a language over $\Sigma↑{\ast}$, it can be used to define two equivalence
relations.  The first relation makes two strings equivalent if they can be
used interchangably anywhere; the second if they can be used interchangably
at the beginnings of strings.  We call the first $\zzap$, and the
second $\zap$.

We define $\zzap$ (infix equivalence) by saying $x\zzap y$ if replacing an $x$ by a
$y$, or a $y$ by an $x$, in any string of $L$, results in a string of $L$; in
a formula
$$(x \zzap y)≡(∀u,v\in\Sigma↑{\ast})(uxv\in L≡uyv\in L).$$
Since replacing $x$ by itself does not change it, $x \zzap x$, and
$\zzap$ is reflexive.  By the symmetry of $≡$ in the definition,
$x \zzap y≡y \zzap x$, and $\zzap$ is symmetric.  If
$x \zzap y$ and $y \zzap z$, then $uxv\in L⊃uyv\in L⊃uzv\in L$, etc., so
$\zzap$ is transitive.

We define $\zap$ (prefix equivalence) by saying $x \zap y$ if replacing an $x$ by a
$y$ or a $y$ by an $x$ 
at the beginning of a string of $L$ results in a string of $L$; 
in a formula, $$(x \zap y)≡(∀v)(xv\in L≡yv\in L).$$
As with $\zzap$, $\zap$ is reflexive, symmetric, and transitive; both 
are equivalence relations; $\zzap$ is often called {\sl congruence}.  They have
some closure properties under concatenation:

\medskip
\disleft 20pt:(1):If $x \zzap y$ and $u \zzap v$, then $xu \zzap yv$.

\smallskip
\disleft 20pt:(2):If $x \zap y$, and $u \zzap v$, then $xu \zap yv$.

\smallskip
\disleft 20pt:(3):If $x \zap y$, and $u\in\Sigma↑{\ast}$, then $xu \zap yu$. 
(A special case of (2).)

\smallskip
\disleft 20pt:(4):If for all $u$, $ux \zap uy$, then $x \zzap y$.

\medskip\noindent
By substituting $\epsilon$ for $u$ in the definition of $\zzap$, we see that
$x \zzap y$ implies $x \zap y$; that is, $\zzap$ is
{\sl stronger\/} than $\zap$, or $\zap$ is {\sl weaker\/} than $\zzap$.

As with all equivalence relations, $\approx$ and $\sim$ (we omit the $L$ if no
ambiguity arises) imply a partition of $\Sigma↑{\ast}$ into equivalence classes.
If $C$ is an equivalence class, and $x$ and $y$ are two members of $C$, then
$x≡y$, so $x≡u$ implies $y≡u$, etc., and the members of $C$ can, for many
purposes, be used interchangably.  
Let $\hbox{Cong}(x)=\{\,y\mid x\approx y\,\}$, and
$\hbox{Eq}(x)=\{\,y\mid x\sim y\,\}$.  Let {\bf E} be the set of equivalence
 classes 
and~{\bf C} the set of congruence classes.  We call the number (finite or
infinite) of equivalence classes the {\sl index\/} of an equivalence relation.

\vfill\eject

\noindent
{\sl Theorem\/}: A language $L$ is an FAL if and only if $\zzap$ has finite index.

\medskip\noindent
Proof:

\smallskip
(1)\xskip FAL implies finite index. Let $M$ be a 
DFA$(Q,\Sigma ,\delta,q↓0,F)$ accepting   
$L$. For any string~$x$ in~$\Sigma↑{\ast}$, 
let $\delta ↓x$ be the function $Q→Q$ defined
by $\delta ↓x(q)=\delta(q,x)$; if $n=|Q|$, there are at most $n↑n$ such functions.
If $\delta ↓x$ and $\delta ↓y$ are the same function, then $x\approx y$; in fact,
$$uxv\in L≡\delta ↓v\bigl(\delta ↓x\bigl(\delta ↓u(q↓0)\bigr)\bigr)
\in F≡\delta ↓v\bigl(\delta ↓y\bigl(\delta ↓u(q↓0)\bigr)\bigr)
\in F≡uyv\in L\,.$$  
If we could find more than $n↑n$ non-equivalent
strings, their transition functions would all be different, which is not possible
(pigeonhole principle).

(2)\xskip Finite index implies FAL.  Let $L$ be a language, 
$\zzap$ of finite index.
Define $M=\bigl(\hbox{\bf E},\Sigma,\delta,$
$\hbox{Eq}(\epsilon),\ F\bigr)$, 
where $\delta$  is defined by $\delta\bigl(\hbox{Cong}(x),\ a\bigr)
=\hbox{Cong}(xa)$, and 
$F=\{\,\hbox{Cong}(x)\mid x\in L\,\}$.

\smallskip
\noindent
For the student: confirm that $\delta$ is well-defined, and that $M$ 
accepts~$L$.

\medskip\noindent
{\sl Theorem\/}: A language is an FAL if and only if $\zap$ has finite index.

\disleft 20pt:(1):FAL implies finite index.  Let $M$ be as usual.  
Let $q(x)=\delta(q↓0,x)$.
If $n=|Q|$, there are at most $n$ different values of $q(x)$.  If $q(x)=q(y)$,
then $x\sim y$; in fact
$$xv\in L≡\delta\bigl(q(x),v\bigr)\in F≡\delta\bigl(q(y),v\bigr)\in F≡yv\in L\,.$$
If we could find more than $n$ non-congruent strings, their values of $q(x)$
would all be different, which is not possible.

\disleft 20pt:(2):Finite index implies FAL.  Left as an exercise.

\medskip\noindent
{\sl Theorem\/}: The minimum number of states required by a DFA for $L$ is the
index of $\zap$; left as an exercise.

\medskip\noindent
{\sl Example\/}: Let $L↓1=\{A↑iB↑i,i≥1\}$.  Then $x \zzap y$ if $x=y$, and also if:

\smallskip
\disleft 20pt:(1):both contain $BA$

\smallskip\noindent
or

\smallskip
\disleft 20pt:(2):$x=A↑iB↑j,y=A↑kB↑l,i≥1,j≥1,k≥1,l≥1,$ and $i-j=k-l$.

\smallskip\noindent
Also, $x \zap y$ if $x=y$, and also if

\smallskip
\disleft 20pt:(1):each contains $BA$ or has more $B$'s than~$A$'s.

\smallskip\noindent
or

\smallskip
\disleft 20pt:(2):$x=A↑iB↑j,y=A↑kB↑l,i≥j≥1,k≥l≥1$, and $i-j=k-l$.

\medskip\noindent
{\sl Example\/}: Let $L↓2$ be the set of palindromes over $\{A,B\}$. 
Then $x \zzap y$, and $x \zapp y$, only if $x=y$.

\smallskip\noindent
Languages $L↓1$ and $L↓2$ have infinite index, so they are not FAL's.

\medskip\noindent
{\sl Example\/}: Let $L↓3$ be the set of strings over $\{A,B\}$ containing $ABA$.
Then $x \zappp y$ if:

\smallskip
\disleft 20pt:(1):Both contain $ABA$ or

\smallskip
\disleft 20pt:(2):Neither contains $ABA$, and each ends in $A$, or

\smallskip
\disleft 20pt:(3):Neither contains $ABA$, and each ends in $AB$ or

\smallskip
\disleft 20pt:(4):Neither contains $ABA$, and each is $\epsilon$, 
or $B$, or ends in $BB$.

\vfill\eject

\noindent
Thus there are four equivalence classes, corresponding to this minimal $FA$:

\figbox 1.5truein:

\noindent
Hard exercise: define $\zappp$.

Because $x\approx y$, $u\approx v$ implies $xu\approx yv$, we can define
concatenation on equivalence classes; $\hbox{Cong}(x)\oplus 
\hbox{Cong}(y)=\hbox{Cong}(xy)$.

\medskip\noindent
{\sl Exercise\/}: show this uniquely defines $\oplus$. (We will omit 
$\oplus$ when we 
feel like it.)  The equivalence classes of a FAL form a finite semigroup under
concatenation (i.e., $\oplus$ is associative), with identity element 
$\hbox{Eq}(\epsilon)$.
Conversely, if $\Sigma$ is a set of elements of a finite semigroup, with 
operation~$*$, 
and $S$ is any subset of the semigroup, there is an FA that reads
$a↓1a↓2\cdots a↓n\epsilon\Sigma↑{\ast}$, accepting iff 
$a↓1*a↓2\cdots *a↓nεS$.
The FA
need only keep track of the semigroup product of the characters it has read.

Summing up, we have a large set of equivalent definitions of FAL-ness:

\smallskip
\disleft 20pt:(1):$L$ is accepted by a DFA

\smallskip
\disleft 20pt:(2):$L$ is accepted by a NFA

\smallskip
\disleft 20pt:(3):$L$ is regular (defined by a regular expression)

\smallskip
\disleft 20pt:(4):$L$ has a prefix equivalence relation of finite index

\smallskip
\disleft 20pt:(5):$L$ has an infix equivalence relation of finite index

\smallskip
\disleft 20pt:(6):There is a homomorphism $h$ from $\Sigma↑{\ast}$ to a 
finite semigroup $S$ where
$x\in L$ if and only if $h(x)$ belongs to a particular subset of $S$.

\medskip\noindent
{\sl Theorem\/}: $L$ is an FAL if and only if there is a number $n$ such that
every string of length $n$ is equivalent to some shorter string.

\medskip
Thus, for $L$ an FAL, we can give a finite list of replacement rules $(x,y)$,
$|x|>|y|$, such that to test whether $z\in L$, we make allowed replacements until no
more are possible, and then test for membership in a finite set.

\medskip\noindent
{\sl Example\/}: $L↓3$ as defined above.  A sufficient set of replacement rules is:

\noindent
$(BABA,ABA)$

\noindent
$(ABAB,ABA)$

\noindent
$(BBB,BB)$

\noindent
$(AA,A)$

\noindent
$(ABBA,A)$

\vfill\eject

\noindent
{\sl Corollary\/}:

If for each $n≥0$ there is a string of length $n$ not infix equivalent for $L$ to
any shorter string, then $L$ is not a FAL.

\medskip\noindent
{\sl Corollary\/}:

If $L$ is not an FAL, then there is an infinite sequence of characters
$a↓1a↓2a↓3\cdots$, for which the finite prefixes $a↓1a↓2\cdots a↓n$ are all infix
inequivalent for L.  Any program to recognize $L$ must go into a new state
at each character as it reads this sequence.  Then for each $n$, there is some
input of length $n$ that makes it use at least $\lceil \lg (n)\rceil$ 
bits of storage.

\vfill\eject\end